Given a conventional FC layer, we denote $w_i \in \mathbb{R}^{m_i}$ and $a_i \in \mathbb{R}^{C_i}$ as the weights and features of the $i$-th layer, where $m_i = C_i \times C_{i-1}$ and $C_i$ represents the number of output channels of the $i$-th layer. Then we have the following:
$$a_i = a_{i-1} \otimes w_i, \qquad (6.40)$$
where $\otimes$ denotes full-precision multiplication. As mentioned above, the BNN model aims to binarize $w_i$ and $a_i$ into $P_{R \to B}(w_i)$ and $P_{R \to B}(a_i)$. For simplicity, in this chapter we denote $P_{R \to B}(w_i)$ and $P_{R \to B}(a_i)$ as $b_{w_i} \in \mathbb{B}^{m_i}$ and $b_{a_i} \in \mathbb{B}^{C_i}$, respectively. Then, we use efficient XNOR and Bit-count operations to replace the full-precision operations. Following [199], the forward process of the BNN is
$$a_i = b_{a_{i-1}} \odot b_{w_i}, \qquad (6.41)$$
where $\odot$ represents the efficient XNOR and Bit-count operations. Based on XNOR-Net [199], we introduce a learnable channel-wise scale factor to modulate the amplitude of the real-valued convolution. Aligned with the Batch Normalization (BN) and activation layers, the process is formulated as
$$b_{a_i} = \mathrm{sign}(\Phi(\alpha_i \circ b_{a_{i-1}} \odot b_{w_i})), \qquad (6.42)$$
where we divide the data flow in POEM into units for a detailed discussion. In POEM, the original output feature $a_i$ is first scaled by a channel-wise scale factor (vector) $\alpha_i \in \mathbb{R}^{C_i}$ to modulate the amplitude of its full-precision counterpart. It then enters $\Phi(\cdot)$, a composite function built by stacking several layers, e.g., the BN layer, the non-linear activation layer, and the max-pooling layer. The output is then binarized by the sign function to obtain the binary activations $b_{a_i} \in \mathbb{B}^{C_i}$, where $\mathrm{sign}(\cdot)$ returns $+1$ if its input is greater than zero and $-1$ otherwise. The 1-bit activation $b_{a_i}$ can then be used for the efficient XNOR and Bit-count operations of the $(i+1)$-th layer.
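To make the forward pass of Eqs. (6.41) and (6.42) concrete, the following PyTorch-style sketch shows one such unit. It is only an illustration under assumed names (`xnor_bitcount_dot`, `BiFCUnit`, and `phi` are ours, not from the text): the $\pm 1$ multiplication is emulated with an ordinary matrix product rather than real XNOR/Bit-count kernels, $\Phi(\cdot)$ is reduced to a BN layer followed by a non-linear activation, and the straight-through estimator needed to train through $\mathrm{sign}(\cdot)$ is omitted.

```python
import torch
import torch.nn as nn


def xnor_bitcount_dot(x: torch.Tensor, w: torch.Tensor) -> torch.Tensor:
    """Dot product of two {-1, +1} vectors via the XNOR/Bit-count identity:
    <x, w> = 2 * popcount(x XNOR w) - n, with +1 encoded as bit 1."""
    x_bits, w_bits = x > 0, w > 0
    popcount = (x_bits == w_bits).sum()          # XNOR followed by Bit-count
    return 2 * popcount - x.numel()


class BiFCUnit(nn.Module):
    """One POEM-style Bi-FC unit following Eq. (6.42):
    b_{a_i} = sign(Phi(alpha_i o (b_{a_{i-1}} xnor b_{w_i})))."""

    def __init__(self, in_channels: int, out_channels: int):
        super().__init__()
        # Real-valued latent weights w_i (kept for training and reconstruction).
        self.weight = nn.Parameter(0.01 * torch.randn(out_channels, in_channels))
        # Learnable channel-wise scale factor alpha_i in R^{C_i}.
        self.alpha = nn.Parameter(torch.ones(out_channels))
        # Phi(.): a composite of BN and a non-linear activation (pooling omitted).
        self.phi = nn.Sequential(nn.BatchNorm1d(out_channels), nn.Hardtanh())

    def forward(self, b_a_prev: torch.Tensor) -> torch.Tensor:
        # Binarize the weights; on hardware the +/-1 product becomes XNOR + Bit-count.
        b_w = torch.sign(self.weight)
        out = b_a_prev @ b_w.t()                 # b_{a_{i-1}} xnor b_{w_i}, emulated
        out = self.phi(self.alpha * out)         # channel-wise scaling, then Phi(.)
        return torch.sign(out)                   # 1-bit activations b_{a_i}


# Toy usage: a batch of 1-bit activations from the previous layer.
unit = BiFCUnit(in_channels=128, out_channels=64)
b_a_prev = torch.sign(torch.randn(8, 128))
b_a = unit(b_a_prev)                             # 1-bit activations, shape (8, 64)
```

The `xnor_bitcount_dot` helper only illustrates why the $\pm 1$ product can be replaced by XNOR and Bit-count; in the sketch the ordinary matrix product plays that role for simplicity.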
6.3.3 Supervision for POEM
To constrain the Bi-FC layers to have binarized weights with amplitudes similar to their real-valued counterparts, we introduce a new loss function in our supervision for POEM. We consider that the unbinarized weights should be reconstructable from the binarized weights, as revealed in Eq. 6.38. We define the reconstruction loss according to Eq. 6.38 as
$$L_R = \frac{1}{2}\,\|w_i - \alpha_i \circ b_{w_i}\|_2^2, \qquad (6.43)$$
where $L_R$ is the reconstruction loss. Taking into account the impact of $\alpha_i$ on the layer output, we define the learning objective of our POEM as
$$\arg\min_{\{w_i,\alpha_i,p_i\},\,\forall i\in N} L_S(w_i,\alpha_i,p_i) + \lambda L_R(w_i,\alpha_i), \qquad (6.44)$$
where $p_i$ denotes the other parameters of the real-valued layers in the network, e.g., the BN layer, the activation layer, and the unbinarized fully-connected layer, $N$ denotes the number of layers in the network, $L_S$ is the cross-entropy loss, and $\lambda$ is a hyperparameter that balances the two terms.
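As a hedged illustration of Eqs. (6.43) and (6.44), the sketch below computes the combined loss for a single Bi-FC layer; `poem_loss` and `lambda_r` are our own names, and a full model would sum the reconstruction term over all Bi-FC layers.

```python
import torch
import torch.nn.functional as F


def poem_loss(logits, targets, weight, alpha, lambda_r=1e-4):
    """Combined objective of Eq. (6.44), L_S + lambda * L_R, for one layer.

    logits/targets feed the cross-entropy term L_S; weight (w_i) and
    alpha (alpha_i) feed the reconstruction term L_R of Eq. (6.43).
    """
    l_s = F.cross_entropy(logits, targets)       # L_S
    b_w = torch.sign(weight)                     # b_{w_i}
    # L_R = 1/2 * || w_i - alpha_i o b_{w_i} ||_2^2, alpha broadcast per output channel.
    l_r = 0.5 * (weight - alpha.unsqueeze(1) * b_w).pow(2).sum()
    return l_s + lambda_r * l_r


# Toy usage with random predictions and a random weight matrix.
logits = torch.randn(8, 10)                      # 10-way classification
targets = torch.randint(0, 10, (8,))
weight = torch.randn(10, 128)                    # w_i
alpha = torch.ones(10)                           # alpha_i
loss = poem_loss(logits, targets, weight, alpha, lambda_r=1e-4)
```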
Unlike binarization methods such as XNOR-Net [199] and Bi-Real Net [159], where only the reconstruction loss is considered in the weight calculation, our discrete optimization method computes the Bi-FC layers by considering both the reconstruction loss and the softmax loss in a unified framework. By fine-tuning the value of $\lambda$, the proposed POEM achieves much better performance than XNOR-Net, which shows the effectiveness of the combined loss over the softmax loss alone.
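For completeness, a minimal (and again assumed) training step tying the two sketches above together might look as follows; real POEM training would additionally use a straight-through estimator for $\mathrm{sign}(\cdot)$ and sum $L_R$ over every Bi-FC layer.

```python
import torch

# One optimization step over {w_i, alpha_i, p_i}, as in Eq. (6.44), reusing the
# BiFCUnit and poem_loss sketches from above.
unit = BiFCUnit(in_channels=128, out_channels=10)   # treat the unit as a 10-way head
optimizer = torch.optim.Adam(unit.parameters(), lr=1e-3)

x = torch.sign(torch.randn(8, 128))                 # 1-bit input activations
targets = torch.randint(0, 10, (8,))

# Keep the logits real-valued (skip the final sign) so cross-entropy is informative.
logits = unit.phi(unit.alpha * (x @ torch.sign(unit.weight).t()))
loss = poem_loss(logits, targets, unit.weight, unit.alpha, lambda_r=1e-4)

optimizer.zero_grad()
loss.backward()
optimizer.step()
```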